
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="zh_CN">
  <head>
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>XML Processing Modules &#8212; Python 3.7.3 documentation</title>
    <link rel="stylesheet" href="../_static/pydoctheme.css" type="text/css" />
    <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
    
    <script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
    <script type="text/javascript" src="../_static/jquery.js"></script>
    <script type="text/javascript" src="../_static/underscore.js"></script>
    <script type="text/javascript" src="../_static/doctools.js"></script>
    <script type="text/javascript" src="../_static/language_data.js"></script>
    <script type="text/javascript" src="../_static/translations.js"></script>
    
    <script type="text/javascript" src="../_static/sidebar.js"></script>
    
    <link rel="search" type="application/opensearchdescription+xml"
          title="Search within Python 3.7.3 documentation"
          href="../_static/opensearch.xml"/>
    <link rel="author" title="About these documents" href="../about.html" />
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="copyright" title="Copyright" href="../copyright.html" />
    <link rel="next" title="xml.etree.ElementTree --- The ElementTree XML API" href="xml.etree.elementtree.html" />
    <link rel="prev" title="html.entities --- Definitions of HTML general entities" href="html.entities.html" />
    <link rel="shortcut icon" type="image/png" href="../_static/py.png" />
    <link rel="canonical" href="https://docs.python.org/3/library/xml.html" />
    
    <script type="text/javascript" src="../_static/copybutton.js"></script>
    <script type="text/javascript" src="../_static/switchers.js"></script>
    
    
    
    <style>
      @media only screen {
        table.full-width-table {
            width: 100%;
        }
      }
    </style>
 

  </head><body>  
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="../py-modindex.html" title="Python Module Index"
             >modules</a> |</li>
        <li class="right" >
          <a href="xml.etree.elementtree.html" title="xml.etree.ElementTree --- The ElementTree XML API"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="html.entities.html" title="html.entities --- Definitions of HTML general entities"
             accesskey="P">previous</a> |</li>
        <li><img src="../_static/py.png" alt=""
                 style="vertical-align: middle; margin-top: -1px"/></li>
        <li><a href="https://www.python.org/">Python</a> &#187;</li>
        <li>
          <span class="language_switcher_placeholder">zh_CN</span>
          <span class="version_switcher_placeholder">3.7.3</span>
          <a href="../index.html">Documentation</a> &#187;
        </li>

          <li class="nav-item nav-item-1"><a href="index.html" >The Python Standard Library</a> &#187;</li>
          <li class="nav-item nav-item-2"><a href="markup.html" accesskey="U">Structured Markup Processing Tools</a> &#187;</li>
    <li class="right">
        

    <div class="inline-search" style="display: none" role="search">
        <form class="inline-search" action="../search.html" method="get">
          <input placeholder="Quick search" type="text" name="q" />
          <input type="submit" value="Go" />
          <input type="hidden" name="check_keywords" value="yes" />
          <input type="hidden" name="area" value="default" />
        </form>
    </div>
    <script type="text/javascript">$('.inline-search').show(0);</script>
         |
    </li>

      </ul>
    </div>    

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <div class="section" id="module-xml">
<span id="xml-processing-modules"></span><span id="xml"></span><h1>XML Processing Modules<a class="headerlink" href="#module-xml" title="Permalink to this headline">¶</a></h1>
<p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.7/Lib/xml/">Lib/xml/</a></p>
<hr class="docutils" />
<p>Python's interfaces for processing XML are grouped in the <code class="docutils literal notranslate"><span class="pre">xml</span></code> package.</p>
<div class="admonition warning">
<p class="first admonition-title">Warning</p>
<p class="last">The XML modules are not secure against erroneous or maliciously constructed data. If you need to parse untrusted or unauthenticated data, see the <a class="reference internal" href="#xml-vulnerabilities"><span class="std std-ref">XML vulnerabilities</span></a> and <a class="reference internal" href="#defused-packages"><span class="std std-ref">The defusedxml and defusedexpat Packages</span></a> sections.</p>
</div>
<p>Note that the modules in the <a class="reference internal" href="#module-xml" title="xml: Package containing XML processing modules"><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml</span></code></a> package require at least one SAX-compliant XML parser to be available. The Expat parser is included with Python, so the <a class="reference internal" href="pyexpat.html#module-xml.parsers.expat" title="xml.parsers.expat: An interface to the Expat non-validating XML parser."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.parsers.expat</span></code></a> module will always be available.</p>
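<p>Because Expat ships with Python, it can be driven directly without any other parser installed. The following is a minimal sketch, for trusted input only; the helper name <code class="docutils literal notranslate"><span class="pre">count_elements</span></code> and the sample document are illustrative assumptions, not part of the library.</p>

```python
import xml.parsers.expat


def count_elements(data: bytes) -> dict:
    """Count how often each element name starts in an XML byte string."""
    counts = {}

    def on_start(name, attrs):
        # Expat calls this handler once for every start tag it encounters.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return counts


# Illustrative, trusted sample document.
sample = b"<root><item/><item/><note>hi</note></root>"
print(count_elements(sample))
```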
<p>The documentation for the <a class="reference internal" href="xml.dom.html#module-xml.dom" title="xml.dom: Document Object Model API for Python."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom</span></code></a> and <a class="reference internal" href="xml.sax.html#module-xml.sax" title="xml.sax: Package containing SAX2 base classes and convenience functions."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.sax</span></code></a> packages is the definition of the Python bindings for the DOM and SAX interfaces.</p>
<p>The XML handling submodules are:</p>
<ul class="simple">
<li><a class="reference internal" href="xml.etree.elementtree.html#module-xml.etree.ElementTree" title="xml.etree.ElementTree: Implementation of the ElementTree API."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.etree.ElementTree</span></code></a>: the ElementTree API, a simple and lightweight XML processor</li>
</ul>
<ul class="simple">
<li><a class="reference internal" href="xml.dom.html#module-xml.dom" title="xml.dom: Document Object Model API for Python."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom</span></code></a>: the DOM API definition</li>
<li><a class="reference internal" href="xml.dom.minidom.html#module-xml.dom.minidom" title="xml.dom.minidom: Minimal Document Object Model (DOM) implementation."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.minidom</span></code></a>: a minimal DOM implementation</li>
<li><a class="reference internal" href="xml.dom.pulldom.html#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a>: support for building partial DOM trees</li>
</ul>
<ul class="simple">
<li><a class="reference internal" href="xml.sax.html#module-xml.sax" title="xml.sax: Package containing SAX2 base classes and convenience functions."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.sax</span></code></a>: SAX2 base classes and convenience functions</li>
<li><a class="reference internal" href="pyexpat.html#module-xml.parsers.expat" title="xml.parsers.expat: An interface to the Expat non-validating XML parser."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.parsers.expat</span></code></a>: the Expat parser binding</li>
</ul>
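<p>To give a feel for how the submodules listed above differ, the sketch below reads the same small document with <code class="docutils literal notranslate"><span class="pre">xml.etree.ElementTree</span></code> and with <code class="docutils literal notranslate"><span class="pre">xml.dom.minidom</span></code>. The document, the element names, and the helper function names are made up for illustration, and the input is assumed to be trusted.</p>

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Illustrative, trusted sample document.
DOC = "<library><book id='1'>Dune</book><book id='2'>Emma</book></library>"


def titles_with_etree(text):
    """Collect book titles using the ElementTree API."""
    root = ET.fromstring(text)
    return [book.text for book in root.iter("book")]


def titles_with_minidom(text):
    """Collect book titles using the minimal DOM implementation."""
    dom = minidom.parseString(text)
    # Each <book> element holds a single text node as its first child.
    return [node.firstChild.data for node in dom.getElementsByTagName("book")]


print(titles_with_etree(DOC))
print(titles_with_minidom(DOC))
```

<p>Both helpers return the same titles; ElementTree exposes text as an attribute of the element, whereas the DOM API models it as a separate child node.</p>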
<div class="section" id="xml-vulnerabilities">
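<span id="id1"></span><h2>XML vulnerabilities<a class="headerlink" href="#xml-vulnerabilities" title="Permalink to this headline">¶</a></h2>
<p>The XML processing modules are not secure against maliciously constructed data. An attacker can abuse XML features to carry out denial of service attacks, access local files, generate network connections to other machines, or circumvent firewalls.</p>
<p>The following table gives an overview of the known attacks and whether the various modules are vulnerable to them.</p>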
<table border="1" class="docutils">
<colgroup>
<col width="26%" />
<col width="15%" />
<col width="16%" />
<col width="15%" />
<col width="15%" />
<col width="15%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">kind</th>
<th class="head">sax</th>
<th class="head">etree</th>
<th class="head">minidom</th>
<th class="head">pulldom</th>
<th class="head">xmlrpc</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>billion laughs</td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
</tr>
<tr class="row-odd"><td>quadratic blowup</td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
<td><strong>Vulnerable</strong></td>
</tr>
<tr class="row-even"><td>external entity expansion</td>
<td>Safe (4)</td>
<td>Safe (1)</td>
<td>Safe (2)</td>
<td>Safe (4)</td>
<td>Safe (3)</td>
</tr>
<tr class="row-odd"><td><a class="reference external" href="https://en.wikipedia.org/wiki/Document_type_definition">DTD</a> retrieval</td>
<td>Safe (4)</td>
<td>Safe</td>
<td>Safe</td>
<td>Safe (4)</td>
<td>Safe</td>
</tr>
<tr class="row-even"><td>decompression bomb</td>
<td>Safe</td>
<td>Safe</td>
<td>Safe</td>
<td>Safe</td>
<td><strong>Vulnerable</strong></td>
</tr>
</tr>
</tbody>
</table>
<ol class="arabic simple">
<li><a class="reference internal" href="xml.etree.elementtree.html#module-xml.etree.ElementTree" title="xml.etree.ElementTree: Implementation of the ElementTree API."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.etree.ElementTree</span></code></a> doesn't expand external entities and raises a <code class="xref py py-exc docutils literal notranslate"><span class="pre">ParseError</span></code> when an entity occurs.</li>
<li><a class="reference internal" href="xml.dom.minidom.html#module-xml.dom.minidom" title="xml.dom.minidom: Minimal Document Object Model (DOM) implementation."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.minidom</span></code></a> doesn't expand external entities and simply returns the unexpanded entity verbatim.</li>
<li><code class="xref py py-mod docutils literal notranslate"><span class="pre">xmlrpc.client</span></code> (formerly <code class="xref py py-mod docutils literal notranslate"><span class="pre">xmlrpclib</span></code>) doesn't expand external entities and omits them.</li>
<li>Since Python 3.7.1, external general entities are no longer processed by default.</li>
</ol>
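<p>Note (1) can be observed with a short experiment. The sketch below declares an external entity that points at a placeholder path (a made-up location that is never opened) and shows that <code class="docutils literal notranslate"><span class="pre">xml.etree.ElementTree</span></code> rejects the reference instead of fetching it. The helper name <code class="docutils literal notranslate"><span class="pre">try_parse</span></code> is an assumption for illustration.</p>

```python
import xml.etree.ElementTree as ET

# The SYSTEM identifier is a placeholder; ElementTree never tries to read it.
XXE_DOC = """<!DOCTYPE data [
  <!ENTITY ext SYSTEM "file:///nonexistent/example.txt">
]>
<data>&ext;</data>"""


def try_parse(text):
    """Return 'parsed' on success, or a short message if parsing is refused."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return "rejected: {}".format(exc)
    return "parsed"


print(try_parse(XXE_DOC))
print(try_parse("<data>ok</data>"))
```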
<dl class="docutils">
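<dt>billion laughs / exponential entity expansion</dt>
<dd>The <a class="reference external" href="https://en.wikipedia.org/wiki/Billion_laughs">Billion Laughs</a> attack, also known as exponential entity expansion, uses multiple levels of nested entities. Each entity refers to another entity several times, and the final entity definition contains a small string. The exponential expansion results in several gigabytes of text and consumes large amounts of memory and CPU time.</dd>
<dt>quadratic blowup entity expansion</dt>
<dd>A quadratic blowup attack is similar to a <a class="reference external" href="https://en.wikipedia.org/wiki/Billion_laughs">Billion Laughs</a> attack in that it also abuses entity expansion. Instead of nesting entities, it repeats one large entity of a few thousand characters over and over again. The attack is less efficient than the exponential case, but it avoids triggering parser countermeasures that forbid deeply nested entities.</dd>
<dt>external entity expansion</dt>
<dd>Entity declarations can contain more than just replacement text. They can also point to external resources or local files. The XML parser then accesses the resource and embeds its content into the XML document.</dd>
<dt><a class="reference external" href="https://en.wikipedia.org/wiki/Document_type_definition">DTD</a> retrieval</dt>
<dd>Some XML libraries, such as Python's <a class="reference internal" href="xml.dom.pulldom.html#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code class="xref py py-mod docutils literal notranslate"><span class="pre">xml.dom.pulldom</span></code></a>, retrieve document type definitions from remote or local locations. The feature has similar implications to the external entity expansion issue.</dd>
<dt>decompression bomb</dt>
<dd>Decompression bombs (also known as <a class="reference external" href="https://en.wikipedia.org/wiki/Zip_bomb">ZIP bombs</a>) apply to all XML libraries that can parse compressed XML streams, such as gzipped HTTP streams or LZMA-compressed files. For an attacker, compression can reduce the amount of transmitted data by three orders of magnitude or more.</dd>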
</dl>
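<p>The growth behind the entity expansion entries above is plain multiplication: with a seed string of <em>s</em> characters, <em>f</em> references per level and <em>n</em> levels, the expanded text has <em>s</em> &times; <em>f</em><sup><em>n</em></sup> characters. The sketch below illustrates this with a deliberately tiny, harmless document; the function names and parameters are assumptions chosen for the example.</p>

```python
import xml.etree.ElementTree as ET


def expanded_size(seed_len, fanout, levels):
    """Characters produced when the top-level entity is fully expanded."""
    return seed_len * fanout ** levels


def nested_entity_doc(levels, fanout, seed="ha"):
    """Build a small document whose entities reference each other in a chain."""
    decls = ['<!ENTITY e0 "{}">'.format(seed)]
    for i in range(1, levels + 1):
        refs = "&e{};".format(i - 1) * fanout
        decls.append('<!ENTITY e{} "{}">'.format(i, refs))
    return "<!DOCTYPE d [{}]><d>&e{};</d>".format("".join(decls), levels)


# Harmless scale: 3 levels with 4 references each expand "ha" to 128 characters.
doc = nested_entity_doc(levels=3, fanout=4)
expanded = ET.fromstring(doc).text
print(len(doc), len(expanded))

# A 3-character seed with 9 levels of 10 references each reaches gigabytes.
print(expanded_size(seed_len=3, fanout=10, levels=9))
```

<p>The parsed example stays far below any dangerous size on purpose; only the arithmetic helper is used for the large case.</p>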
<p>The documentation for <a class="reference external" href="https://pypi.org/project/defusedxml/">defusedxml</a> on PyPI has further information about all known attack vectors, with examples and references.</p>
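<p>For the decompression bomb case described above, one possible mitigation is to cap how much data is inflated before it reaches an XML parser. The following is a sketch only, using the standard <code class="docutils literal notranslate"><span class="pre">zlib</span></code> module; the helper name <code class="docutils literal notranslate"><span class="pre">bounded_gunzip</span></code> and the limits are assumptions, and real services would pick limits to suit their own payloads.</p>

```python
import gzip
import zlib


def bounded_gunzip(data: bytes, limit: int) -> bytes:
    """Inflate gzip data, refusing to produce more than `limit` bytes."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Ask for one byte more than allowed so an oversized payload is detectable.
    out = decomp.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed data exceeds {} bytes".format(limit))
    return out


# Highly repetitive XML compresses extremely well.
bomb_like = gzip.compress(b"<a/>" * 250000)
print(len(bomb_like), "compressed bytes expand to 1000000 bytes")
```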
</div>
<div class="section" id="the-defusedxml-and-defusedexpat-packages">
<span id="defused-packages"></span><h2>The <code class="xref py py-mod docutils literal notranslate"><span class="pre">defusedxml</span></code> and <code class="xref py py-mod docutils literal notranslate"><span class="pre">defusedexpat</span></code> Packages<a class="headerlink" href="#the-defusedxml-and-defusedexpat-packages" title="Permalink to this headline">¶</a></h2>
<p><a class="reference external" href="https://pypi.org/project/defusedxml/">defusedxml</a> is a pure Python package with modified subclasses of all standard library XML parsers that prevent any potentially malicious operation. Use of this package is recommended for any server code that parses untrusted XML data. The package also ships with example exploits and extended documentation on further XML exploits such as XPath injection.</p>
<p><a class="reference external" href="https://pypi.org/project/defusedexpat/">defusedexpat</a> provides a modified libexpat and a patched <code class="xref py py-mod docutils literal notranslate"><span class="pre">pyexpat</span></code> module that have countermeasures against entity expansion DoS attacks. The <code class="xref py py-mod docutils literal notranslate"><span class="pre">defusedexpat</span></code> module still allows a sane and configurable amount of entity expansion. The modifications may be included in some future release of Python, but they will not be included in any bugfix release of Python because they break backward compatibility.</p>
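<p>When neither package can be installed, a rough pre-check with the standard library alone is to refuse any document that carries a DOCTYPE declaration, since the entity-based attacks above depend on one. The sketch below is not how <code class="docutils literal notranslate"><span class="pre">defusedxml</span></code> is implemented and is not a substitute for it; the exception class <code class="docutils literal notranslate"><span class="pre">DoctypeRejected</span></code> and the function <code class="docutils literal notranslate"><span class="pre">check_no_doctype</span></code> are hypothetical names.</p>

```python
import xml.parsers.expat


class DoctypeRejected(ValueError):
    """Raised when a document contains a DOCTYPE declaration (hypothetical)."""


def check_no_doctype(data: bytes) -> None:
    """Parse `data` with Expat and fail as soon as a DOCTYPE is seen."""

    def reject_doctype(name, sysid, pubid, has_internal_subset):
        # Raising inside a handler aborts parsing and propagates to the caller.
        raise DoctypeRejected("DOCTYPE {!r} is not allowed".format(name))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(data, True)


check_no_doctype(b"<a>plain document</a>")  # passes silently

try:
    check_no_doctype(b'<!DOCTYPE a [<!ENTITY x "y">]><a>&x;</a>')
except DoctypeRejected as exc:
    print("refused:", exc)
```

<p>Rejecting every DOCTYPE is a blunt policy that also refuses some legitimate documents, so it suits only inputs where DTDs are never expected.</p>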
</div>
</div>


          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h3><a href="../contents.html">Table of Contents</a></h3>
  <ul>
<li><a class="reference internal" href="#">XML Processing Modules</a><ul>
<li><a class="reference internal" href="#xml-vulnerabilities">XML vulnerabilities</a></li>
<li><a class="reference internal" href="#the-defusedxml-and-defusedexpat-packages">The <code class="docutils literal notranslate"><span class="pre">defusedxml</span></code> and <code class="docutils literal notranslate"><span class="pre">defusedexpat</span></code> Packages</a></li>
</ul>
</li>
</ul>

  <h4>Previous topic</h4>
  <p class="topless"><a href="html.entities.html"
                        title="previous chapter"><code class="docutils literal notranslate"><span class="pre">html.entities</span></code> --- Definitions of HTML general entities</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="xml.etree.elementtree.html"
                        title="next chapter"><code class="docutils literal notranslate"><span class="pre">xml.etree.ElementTree</span></code> --- The ElementTree XML API</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="../bugs.html">Report a Bug</a></li>
      <li>
        <a href="https://github.com/python/cpython/blob/3.7/Doc/library/xml.rst"
            rel="nofollow">Show Source
        </a>
      </li>
    </ul>
  </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>  
    <div class="footer">
    &copy; <a href="../copyright.html">版权所有</a> 2001-2019, Python Software Foundation.
    <br />
    Python 软件基金会是一个非盈利组织。
    <a href="https://www.python.org/psf/donations/">请捐助。</a>
    <br />
    最后更新于 4月 09, 2019.
    <a href="../bugs.html">发现了问题</a>？
    <br />
    使用<a href="http://sphinx.pocoo.org/">Sphinx</a>1.8.4 创建。
    </div>

  </body>
</html>